A Literature Survey on Domain Adaptation of Statistical Classifiers

Author

  • Jing Jiang
Abstract

Domain adaptation, especially in natural language processing, has only recently begun to attract significant attention [Daumé III and Marcu, 2006, Blitzer et al., 2006, Ben-David et al., 2007, Daumé III, 2007, Satpal and Sarawagi, 2007]. However, some special cases of the problem have been studied earlier under different names, such as class imbalance [Japkowicz and Stephen, 2002], covariate shift [Shimodaira, 2000], and sample selection bias [Heckman, 1979]. Several well-studied machine learning problems are also closely related but not equivalent to domain adaptation, including multi-task learning [Caruana, 1997] and semi-supervised learning [Chapelle et al., 2006]. In this literature survey, we review existing work related to domain adaptation in both the machine learning and the natural language processing communities. Because this relatively new topic continues to attract attention, our survey is necessarily incomplete. Nevertheless, we try to cover the major lines of work that we are aware of as of the time of writing. This survey will also be updated regularly.

The goal of this literature survey is twofold. First, existing studies on domain adaptation appear very different from one another, and different terms are used to refer to the problem. No existing survey connects these studies. This survey therefore tries to organize the existing work systematically and to draw a big picture of the domain adaptation problem and its possible solutions. Second, a systematic literature survey reveals the limitations of current work and points out promising directions to explore.

Similar resources

Active Learning for Cross-domain Sentiment Classification

In the literature, various approaches have been proposed to address the domain adaptation problem in sentiment classification (also called cross-domain sentiment classification). However, adaptation performance usually suffers considerably when the data distributions in the source and target domains differ significantly. In this paper, we propose performing active learning for cross-domain sentime...

Domain Adaptation for Statistical Classifiers

The most basic assumption used in statistical learning theory is that training data and test data are drawn from the same underlying distribution. Unfortunately, in many applications, the “in-domain” test data is drawn from a distribution that is related, but not identical, to the “out-of-domain” distribution of the training data. We consider the common case in which labeled out-of-domain data ...

Bagging-based System Combination for Domain Adaptation

Domain adaptation plays an important role in multi-domain SMT. Conventional approaches usually resort to statistical classifiers, but they require annotated monolingual data in different domains, which may not be available in some cases. We instead propose a simple but effective bagging-based approach without using any annotated data. Large-scale experiments show that our new method improves tr...

Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning

Domain adaptation is a powerful technique given a large amount of labeled data with similar attributes in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. It is effective in image classification, where it is expensive and time-consuming to obtain adequate labeled data. We propose a novel method named DALRRL, which consists of deep ...

Bootstrapping polarity classifiers with rule-based classification

In this article, we examine the effectiveness of bootstrapping supervised machine-learning polarity classifiers with the help of a domain-independent rule-based classifier that relies on a lexical resource, i.e., a polarity lexicon and a set of linguistic rules. The benefit of this method is that, although no labeled training data are required, it allows a classifier to capture in-domain knowledge ...

Publication date: 2007